fix: Support both reference.dict and reference.fasta.dict in GRIDSS wrappers #4426

RitzelC · 2025-08-19T16:13:26Z

Reference setup has changed to produce reference.fasta.dict instead of reference.dict.

Here I have changed it to just check the existance of the given dictionary without asuming a fixed ending like .dict.

Summary by CodeRabbit

Refactor
- Standardized input naming in assembly workflows for consistency.
Bug Fixes
- Prevents incorrect derivation of reference dictionary paths by using the provided dictionary input, avoiding mismatches in processing.
- Updated command invocation to match renamed assembly inputs.
Documentation
- Clarified missing/dictionary error messages to aid troubleshooting.

Reference setup has changed to produce reference.fasta.dict instead of reference.dict. Here I have changed it to just check the existance of the given dictionary without asuming a fixed ending like .dict.

coderabbitai · 2025-08-19T16:13:34Z

📝 Walkthrough

Walkthrough

Renamed Snakemake inputs in the assemble rule (bams → bam, bais → bai). Assemble wrapper updated to use snakemake.input.bam and to derive dictionary as reference + ".dict". Call and preprocess wrappers stop auto-deriving the dictionary and now expect/validate a provided dictionary path with updated error text.

Changes

Cohort / File(s)	Summary
Assemble rule & wrapper `bio/gridss/assemble/test/Snakefile`, `bio/gridss/assemble/wrapper.py`	Rule inputs renamed `bams`→`bam`, `bais`→`bai`. Wrapper command now uses `snakemake.input.bam`. Dictionary derivation changed to `reference + ".dict"`. Missing-dictionary message text adjusted.
Call wrapper dictionary sourcing `bio/gridss/call/wrapper.py`	Removed automatic dictionary derivation; dictionary sourced from `snakemake.input.get("dictionary")` and used as-is. Missing-dictionary error message updated to reference the provided path directly.
Preprocess wrapper dictionary handling `bio/gridss/preprocess/wrapper.py`	Removed auto-derivation of dictionary (comment added explaining filename mismatch). Missing-dictionary error message updated; existence check retained.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    actor Snakemake
    participant AssembleWrapper as assemble/wrapper.py
    participant CallWrapper as call/wrapper.py
    participant PreprocessWrapper as preprocess/wrapper.py
    participant FS as FileSystem

    Note over Snakemake,AssembleWrapper: Assemble rule invokes wrapper with renamed inputs
    Snakemake->>AssembleWrapper: pass inputs (bam, bai, reference, dictionary?)
    AssembleWrapper->>FS: check dictionary path (reference + ".dict" or provided)
    FS-->>AssembleWrapper: exists / missing
    alt dictionary exists
        AssembleWrapper->>FS: read BAM via snakemake.input.bam
        AssembleWrapper->>Snakemake: execute GridSS with {snakemake.input.bam}
    else missing
        AssembleWrapper-->>Snakemake: raise error "Dictionary ... missing"
    end

    Note over CallWrapper,PreprocessWrapper: Both now validate a provided dictionary path
    Snakemake->>CallWrapper: pass reference, dictionary
    CallWrapper->>FS: check provided dictionary
    Snakemake->>PreprocessWrapper: pass reference, dictionary
    PreprocessWrapper->>FS: check provided dictionary (error message updated)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Tip

🔌 Remote MCP (Model Context Protocol) integration is now available!

Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats.

✨ Finishing Touches

📝 Generate Docstrings

🧪 Generate unit tests

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

Visit our Status Page to check the current availability of CodeRabbit.
Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

bio/gridss/assemble/wrapper.py (1)
34-41: Do not derive the dictionary from the reference; use the provided input dictionary and fix the error message

Deriving the dictionary with reference + ".dict" contradicts the PR objective and breaks cases where the dictionary is reference.dict. Also, the current error message appends “.dict” to {dictionary}, leading to “.dict.dict” in output.

Apply this diff to accept arbitrary provided dictionary paths and improve validation:
-dictionary = reference + ".dict"
-if not path.exists(dictionary):
-    raise ValueError(
-        "{dictionary}.dict missing. Please make sure the reference dictionary was properly created. This can be accomplished for example by CreateSequenceDictionary.jar from Picard".format(
-            dictionary=dictionary
-        )
-    )
+dictionary = snakemake.input.get("dictionary")
+if not dictionary:
+    raise ValueError("Please set input.dictionary to provide reference dictionary.")
+if not path.exists(dictionary):
+    raise ValueError(
+        "Dictionary {dictionary} missing. Ensure the reference dictionary exists (e.g., reference.dict or reference.fasta.dict). You can create it with Picard CreateSequenceDictionary.".format(
+            dictionary=dictionary
+        )
+    )

🧹 Nitpick comments (3)

bio/gridss/preprocess/wrapper.py (1)
34-41: Error message implies a single naming scheme; make it neutral and accept both reference.dict and reference.fasta.dict

The current message suggests only reference.fa.dict. Since we now accept any provided dictionary path, the message should not prescribe a specific naming convention.

Apply this diff to clarify the guidance:
-        "Dictionary {dictionary} missing or wrong naming (reference.fa.dict). Please make sure the reference dictionary was properly created. This can be accomplished for example by CreateSequenceDictionary.jar from Picard".format(
+        "Dictionary {dictionary} missing. Ensure the reference dictionary exists (e.g., reference.dict or reference.fasta.dict). You can create it with Picard CreateSequenceDictionary.".format(
             dictionary=dictionary
         )
Additionally, consider validating presence of the input key to avoid a TypeError if input.dictionary is omitted:
# Insert after line 19 (after retrieving `dictionary`)
if not dictionary:
    raise ValueError("Please set input.dictionary to provide reference dictionary.")
bio/gridss/call/wrapper.py (1)
34-40: Accept arbitrary dictionary paths; update error message accordingly

Good move sourcing the dictionary from Snakemake input. The error message should not append “.dict” or imply a suffix; make it neutral to support both reference.dict and reference.fasta.dict. Also, guard against missing input.dictionary to avoid a TypeError in path.exists(None).

Apply this diff to update the message:
-        "{dictionary} missing. Please make sure the reference dictionary was properly created. This can be accomplished for example by CreateSequenceDictionary.jar from Picard".format(
+        "Dictionary {dictionary} missing. Ensure the reference dictionary exists (e.g., reference.dict or reference.fasta.dict). You can create it with Picard CreateSequenceDictionary.".format(
             dictionary=dictionary
         )
And add a presence check right after retrieving the dictionary (outside this hunk):
# Insert after line 19 (after retrieving `dictionary`)
if not dictionary:
    raise ValueError("Please set input.dictionary to provide reference dictionary.")
bio/gridss/assemble/wrapper.py (1)
48-48: Typo: “consistency”

Minor spelling fix in the comment.
-    "{snakemake.input.bam} "  # changed to bam for consistancy
+    "{snakemake.input.bam} "  # changed to bam for consistency

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

MCP integration is disabled by default for public repositories
Jira integration is disabled by default for public repositories
Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 892744b and 7701822.

📒 Files selected for processing (4)

bio/gridss/assemble/test/Snakefile (1 hunks)
bio/gridss/assemble/wrapper.py (2 hunks)
bio/gridss/call/wrapper.py (1 hunks)
bio/gridss/preprocess/wrapper.py (1 hunks)

🧰 Additional context used

📓 Path-based instructions (2)

**/*.py

⚙️ CodeRabbit Configuration File

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

Files:

bio/gridss/call/wrapper.py
bio/gridss/assemble/wrapper.py
bio/gridss/preprocess/wrapper.py

**/wrapper.py

⚙️ CodeRabbit Configuration File

Do not complain about use of undefined variable called snakemake.

Files:

bio/gridss/call/wrapper.py
bio/gridss/assemble/wrapper.py
bio/gridss/preprocess/wrapper.py

🧠 Learnings (2)

📚 Learning: 2025-06-04T06:32:20.090Z

Learnt from: rohan-ibn-tariq
PR: snakemake/snakemake-wrappers#4160
File: bio/trf/wrapper.py:18-26
Timestamp: 2025-06-04T06:32:20.090Z
Learning: For Snakemake wrappers, it's preferable to keep parameter dictionaries and constants directly visible in wrapper.py files rather than importing from config modules, to maintain minimal, self-documenting code that doc viewers can understand at a glance.

Applied to files:

bio/gridss/call/wrapper.py

📚 Learning: 2024-11-26T14:59:03.678Z

Learnt from: tdayris
PR: snakemake/snakemake-wrappers#3502
File: bio/ngsbits/sampleancestry/wrapper.py:18-23
Timestamp: 2024-11-26T14:59:03.678Z
Learning: In Snakemake wrapper scripts, Snakemake validates input and output paths, so explicit shell quoting is not necessary.

Applied to files:

bio/gridss/assemble/wrapper.py

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Summary

🔇 Additional comments (1)

bio/gridss/assemble/test/Snakefile (1)

36-37: LGTM: input keys renamed to singular while still providing lists

Using singular keys (bam/bai) while supplying lists via expand is valid in Snakemake and aligns with the updated wrapper usage.

bio/gridss/assemble/wrapper.py

bio/gridss/preprocess/wrapper.py

bio/gridss/assemble/wrapper.py

bio/gridss/call/wrapper.py

bio/gridss/preprocess/wrapper.py

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

bio/gridss/assemble/wrapper.py (1)

37-37: Good: clearer error message with f-string

The updated error text is clear and actionable, and the f-string improves readability.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

MCP integration is disabled by default for public repositories
Jira integration is disabled by default for public repositories
Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 2819e1c and 6605296.

📒 Files selected for processing (3)

bio/gridss/assemble/wrapper.py (2 hunks)
bio/gridss/call/wrapper.py (1 hunks)
bio/gridss/preprocess/wrapper.py (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (2)

bio/gridss/call/wrapper.py
bio/gridss/preprocess/wrapper.py

🧰 Additional context used

📓 Path-based instructions (2)

**/*.py

⚙️ CodeRabbit Configuration File

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

Files:

bio/gridss/assemble/wrapper.py

**/wrapper.py

⚙️ CodeRabbit Configuration File

Do not complain about use of undefined variable called snakemake.

Files:

bio/gridss/assemble/wrapper.py

🧠 Learnings (3)

📚 Learning: 2024-11-26T14:59:03.678Z

Learnt from: tdayris
PR: snakemake/snakemake-wrappers#3502
File: bio/ngsbits/sampleancestry/wrapper.py:18-23
Timestamp: 2024-11-26T14:59:03.678Z
Learning: In Snakemake wrapper scripts, Snakemake validates input and output paths, so explicit shell quoting is not necessary.

Applied to files:

bio/gridss/assemble/wrapper.py

📚 Learning: 2024-11-26T08:31:00.099Z

Learnt from: tdayris
PR: snakemake/snakemake-wrappers#3496
File: bio/mtnucratio/test/Snakefile:2-6
Timestamp: 2024-11-26T08:31:00.099Z
Learning: In test files for Snakemake wrappers, such as `bio/mtnucratio/test/Snakefile`, hard-coded input and output paths are acceptable as examples and do not need to use wildcards to make paths flexible.

Applied to files:

bio/gridss/assemble/wrapper.py

📚 Learning: 2024-11-26T09:16:39.570Z

Learnt from: tdayris
PR: snakemake/snakemake-wrappers#3499
File: bio/ngscheckmate/makesnvpattern/test/Snakefile:14-22
Timestamp: 2024-11-26T09:16:39.570Z
Learning: In the `snakemake-wrappers` repository, when writing test `Snakefile`s (e.g., `bio/ngscheckmate/makesnvpattern/test/Snakefile`), hardcoded input/output paths are acceptable because these are examples and the IO can be chosen freely.

Applied to files:

bio/gridss/assemble/wrapper.py

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)

GitHub Check: testing
GitHub Check: docs
GitHub Check: Summary

🔇 Additional comments (1)

bio/gridss/assemble/wrapper.py (1)

46-46: Fix leftover plural input references in call wrapper

The search uncovered a lingering reference to snakemake.input.bams in the call wrapper, which needs to be aligned with the singular snakemake.input.bam change:

• bio/gridss/call/wrapper.py:48
Replace
"{snakemake.input.bams} "
with
"{snakemake.input.bam} "
After updating, please rerun the pipeline/tests to ensure there are no more bams or bais usages.
⛔ Skipped due to learnings
Learnt from: johanneskoester
PR: snakemake/snakemake-wrappers#3498
File: bio/ngscheckmate/ncm/wrapper.py:68-68
Timestamp: 2024-12-06T14:25:43.922Z
Learning: In the `bio/ngscheckmate/ncm/wrapper.py` file for the NGSCheckMate wrapper, do not suggest adding file existence checks for the list file input, as the files are already required by the wrapper.
Learnt from: tdayris
PR: snakemake/snakemake-wrappers#3502
File: bio/ngsbits/sampleancestry/wrapper.py:18-23
Timestamp: 2024-11-26T14:59:03.678Z
Learning: In Snakemake wrapper scripts, Snakemake validates input and output paths, so explicit shell quoting is not necessary.

coderabbitai · 2025-08-20T07:12:52Z

bio/gridss/assemble/wrapper.py

+dictionary = reference + ".dict"
 if not path.exists(dictionary):
    raise ValueError(
-        "{dictionary}.dict missing. Please make sure the reference dictionary was properly created. This can be accomplished for example by CreateSequenceDictionary.jar from Picard".format(
-            dictionary=dictionary
-        )
+        f"{dictionary} missing. Please make sure the reference dictionary was properly created. This can be accomplished for example by CreateSequenceDictionary.jar from Picard."
    )


🛠️ Refactor suggestion

Support both reference.fa.dict and reference.dict (and an explicit input) to avoid regressions

The current check only accepts reference + ".dict". That breaks setups that still produce reference.dict, which this PR aims to support. Recommend trying both naming conventions and, if present, honoring an explicit snakemake.input.dictionary (aligning with call/preprocess wrappers).

Apply this diff to make the check robust and backward-compatible:

-dictionary = reference + ".dict" -if not path.exists(dictionary): - raise ValueError( - f"{dictionary} missing. Please make sure the reference dictionary was properly created. This can be accomplished for example by CreateSequenceDictionary.jar from Picard." - ) +dictionary = snakemake.input.get("dictionary") +if dictionary: + if not path.exists(dictionary): + raise ValueError( + f"{dictionary} missing. Please make sure the reference dictionary was properly created. This can be accomplished for example by CreateSequenceDictionary.jar from Picard." + ) +else: + candidates = [f"{reference}.dict", f"{path.splitext(reference)[0]}.dict"] + if not any(path.exists(d) for d in candidates): + raise ValueError( + "Reference dictionary not found. Looked for: {}. Please create it (e.g., with Picard CreateSequenceDictionary).".format( + ", ".join(candidates) + ) + )

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

dictionary = reference + ".dict"

if not path.exists(dictionary):

raise ValueError(

"{dictionary}.dict missing. Please make sure the reference dictionary was properly created. This can be accomplished for example by CreateSequenceDictionary.jar from Picard".format(

dictionary=dictionary

)

f"{dictionary} missing. Please make sure the reference dictionary was properly created. This can be accomplished for example by CreateSequenceDictionary.jar from Picard."

)

# before this block, reference is defined above

# replace the original dictionary existence check with:

dictionary = snakemake.input.get("dictionary")

if dictionary:

if not path.exists(dictionary):

raise ValueError(

f"{dictionary} missing. Please make sure the reference dictionary was properly created. "

"This can be accomplished for example by CreateSequenceDictionary.jar from Picard."

)

else:

candidates = [f"{reference}.dict", f"{path.splitext(reference)[0]}.dict"]

if not any(path.exists(d) for d in candidates):

raise ValueError(

"Reference dictionary not found. Looked for: {}. Please create it (e.g., with Picard CreateSequenceDictionary).".format(

", ".join(candidates)

)

)

🤖 Prompt for AI Agents

In bio/gridss/assemble/wrapper.py around lines 34 to 38, the current check only looks for reference + ".dict" which breaks setups that use reference.dict or provide an explicit snakemake.input.dictionary; update the logic to first prefer and honor an explicit snakemake.input.dictionary if present, then check for both reference + ".fa.dict" and reference + ".dict" (or other common variants) and use the first one found, raising the missing-file ValueError only if none of these exist; ensure the chosen dictionary path is assigned to the same variable used downstream.

fgvieira · 2025-09-30T10:31:52Z

bio/gridss/preprocess/wrapper.py

+# dictionary = path.splitext(reference)[0] + ".dict" --> leads to incorrect paths (reference.dict), which is incompatible with setup reference (reference.fa.dict)
 if not path.exists(dictionary):
    raise ValueError(
-        "{dictionary}.dict missing. Please make sure the reference dictionary was properly created. This can be accomplished for example by CreateSequenceDictionary.jar from Picard".format(
-            dictionary=dictionary
-        )
+        f"{dictionary} missing or wrong naming (reference.fa.dict). Please make sure the reference dictionary was properly created. This can be accomplished for example by CreateSequenceDictionary.jar from Picard."


Suggested change

# dictionary = path.splitext(reference)[0] + ".dict" --> leads to incorrect paths (reference.dict), which is incompatible with setup reference (reference.fa.dict)

if not path.exists(dictionary):

raise ValueError(

"{dictionary}.dict missing. Please make sure the reference dictionary was properly created. This can be accomplished for example by CreateSequenceDictionary.jar from Picard".format(

dictionary=dictionary

)

f"{dictionary} missing or wrong naming (reference.fa.dict). Please make sure the reference dictionary was properly created. This can be accomplished for example by CreateSequenceDictionary.jar from Picard."

fgvieira · 2025-09-30T10:33:16Z

bio/gridss/assemble/wrapper.py

+dictionary = reference + ".dict"
 if not path.exists(dictionary):
    raise ValueError(
-        "{dictionary}.dict missing. Please make sure the reference dictionary was properly created. This can be accomplished for example by CreateSequenceDictionary.jar from Picard".format(
-            dictionary=dictionary
-        )
+        f"{dictionary} missing. Please make sure the reference dictionary was properly created. This can be accomplished for example by CreateSequenceDictionary.jar from Picard."


Suggested change

dictionary = reference + ".dict"

if not path.exists(dictionary):

raise ValueError(

"{dictionary}.dict missing. Please make sure the reference dictionary was properly created. This can be accomplished for example by CreateSequenceDictionary.jar from Picard".format(

dictionary=dictionary

)

f"{dictionary} missing. Please make sure the reference dictionary was properly created. This can be accomplished for example by CreateSequenceDictionary.jar from Picard."

fgvieira · 2025-09-30T10:34:21Z

bio/gridss/call/wrapper.py

+#dictionary = path.splitext(reference)[0] + ".dict"
 if not path.exists(dictionary):
    raise ValueError(
-        "{dictionary}.dict missing. Please make sure the reference dictionary was properly created. This can be accomplished for example by CreateSequenceDictionary.jar from Picard".format(
-            dictionary=dictionary
-        )
+        f"{dictionary} missing. Please make sure the reference dictionary was properly created. This can be accomplished for example by CreateSequenceDictionary.jar from Picard."


Suggested change

#dictionary = path.splitext(reference)[0] + ".dict"

if not path.exists(dictionary):

raise ValueError(

"{dictionary}.dict missing. Please make sure the reference dictionary was properly created. This can be accomplished for example by CreateSequenceDictionary.jar from Picard".format(

dictionary=dictionary

)

f"{dictionary} missing. Please make sure the reference dictionary was properly created. This can be accomplished for example by CreateSequenceDictionary.jar from Picard."

fgvieira · 2025-09-30T10:57:19Z

bio/gridss/assemble/test/Snakefile

        reference="reference/genome.fasta",
        dictionary="reference/genome.dict",
        indices=multiext("reference/genome.fasta", *reference_index_endings),


Suggested change

reference="reference/genome.fasta",

dictionary="reference/genome.dict",

indices=multiext("reference/genome.fasta", *reference_index_endings),

ref="reference/genome.fasta",

dict="reference/genome.dict",

idx=multiext("reference/genome.fasta", *reference_index_endings),

fgvieira · 2025-09-30T10:57:39Z

bio/gridss/assemble/test/Snakefile

        assembly_others=expand("{working_dir}/group.bam.gridss.working/group.bam{ending}", working_dir=[WORKING_DIR], ending=assembly_endings)
    params:
        extra="--jvmheap 1g",
        workingdir=WORKING_DIR


Shouldn't this be a temporary directory?

Support both reference.dict and reference.fasta.dict in GRIDSS wrappers

7701822

Reference setup has changed to produce reference.fasta.dict instead of reference.dict. Here I have changed it to just check the existance of the given dictionary without asuming a fixed ending like .dict.

coderabbitai bot reviewed Aug 19, 2025

View reviewed changes